Voice controlled computer interface

ABSTRACT

Voice utterances are substituted for manipulation of a pointing device, the pointing device being of the kind which is manipulated to control motion of a cursor on a computer display and to indicate desired actions associated with the position of the cursor on the display, the cursor being moved and the desired actions being aided by an operating system in the computer in response to control signals received from the pointing device, the computer also having an alphanumeric keyboard, the operating system being separately responsive to control signals received from the keyboard in accordance with a predetermined format specific to the keyboard; in the system, a voice recognizer recognizes the voiced utterance, and an interpreter converts the voiced utterance into control signals which will directly create a desired action aided by the operating system without first being converted into control signals expressed in the predetermined format specific to the keyboard. In another aspect, voiced utterances are converted to commands, expressed in a predefined command language, to be used by an operating system of a computer, by converting some voiced utterances into commands corresponding to actions to be taken by the operating system, and converting other voiced utterances into commands which carry associated text strings to be used as part of text being processed in an application program running under the operating system. In another aspect, a table is generated for aiding the conversion of voiced utterances to commands for use in controlling an operating system of a computer to achieve desired actions in an application program running under the operating system, the application program including menus and control buttons; the instruction sequence of the application program is parsed to identify menu entries and control buttons, and an entry is included in the table for each menu entry and control button found in the application program, each entry in the table containing a command corresponding to the menu entry or control button. In another aspect, a user is enabled to create an instance in a formal language of the kind which has a strictly defined syntax; a graphically displayed list of entries are expressed in a natural language which does not comply with the syntax, the user is permitted to point to an entry on the list, and the instance corresponding to the identified entry in the list is automatically generated in response to the pointing.

BACKGROUND OF THE INVENTION

[0001] This invention relates to voice controlled computer interfaces.

[0002] Voice recognition systems can convert human speech into computer information. Such voice recognition systems have been used, for example, to control text-type user interfaces, e.g., the text-type interface of the disk operating system (DOS) of the IBM Personal Computer.

[0003] Voice control has also been applied to graphical user interfaces, such as the one implemented by the Apple Macintosh computer, which includes icons, pop-up windows, and a mouse. These voice control systems use voiced commands to generate keyboard keystrokes.

SUMMARY OF THE INVENTION

[0004] In general, in one aspect, the invention features enabling voiced utterances to be substituted for manipulation of a pointing device, the pointing device being of the kind which is manipulated to control motion of a cursor on a computer display and to indicate desired actions associated with the position of the cursor on the display, the cursor being moved and the desired actions being aided by an operating system in the computer in response to control signals received from the pointing device, the computer also having an alphanumeric keyboard, the operating system being separately responsive to control signals received from the keyboard in accordance with a predetermined format specific to the keyboard; a voice recognizer recognizes the voiced utterance, and an interpreter converts the voiced utterance into control signals which will directly create a desired action aided by the operating system without first being converted into control signals expressed in the predetermined format specific to the keyboard.

[0005] In general, in another aspect of the invention, voiced utterances are converted to commands, expressed in a predefined command language, to be used by an operating system of a computer, converting some voiced utterances into commands corresponding to actions to be taken by said operating system, and converting other voiced utterances into commands which carry associated text strings to be used as part of text being processed in an application program running under the operating system.

[0006] In general, in another aspect, the invention features generating a table for aiding the conversion of voiced utterances to commands for use in controlling an operating system of a computer to achieve desired actions in an application program running under the operating system, the application program including menus and control buttons; the instruction sequence of the application program is parsed to identify menu entries and control buttons, and an entry is included in the table for each menu entry and control button found in the application program, each entry in the table containing a command corresponding to the menu entry or control button.

[0007] In general, in another aspect, the invention features enabling a user to create an instance in a formal language of the kind which has a strictly defined syntax; a graphically displayed list of entries are expressed in a natural language and do not comply with the syntax, the user is permitted to point to an entry on the list, and the instance corresponding to the identified entry in the list is automatically generated in response to the pointing.

[0008] The invention enables a user to easily control the graphical interface of a computer. Any actions that the operating system can be commanded to take can be commanded by voiced utterances. The commands may include commands that are normally entered through the keyboard as well as commands normally entered through a mouse or any other input device. The user may switch back and forth between voiced utterances that correspond to commands for actions to be taken and voiced utterances that correspond to text strings to be used in an application program without giving any indication that the switch has been made. Any application may be made susceptible to a voice interface by automatically parsing the application instruction sequence for menus and control buttons that control the application.

[0009] Other advantages and features will become apparent from the following description of the preferred embodiment and from the claims.

DESCRIPTION OF THE PREFERRED EMBODIMENT

[0010] We first briefly describe the drawings.

[0011] FIG. 1 is a functional block diagram of a Macintosh computer served by a Voice Navigator voice controlled interface system.

[0012] FIG. 2A is a functional block diagram of a Language Maker system for creating word lists for use with the Voice Navigator interface of FIG. 1.

[0013] FIG. 2B depicts the format of the voice files and word lists used with the Voice Navigator interface.

[0014] FIG. 3 is an organizational block diagram of the Voice Navigator interface system.

[0015] FIG. 4 is a flow diagram of the Language Maker main event loop.

[0016] FIG. 5 is a flow diagram of the Run Edit module.

[0017] FIG. 6 is a flow diagram of the Record Actions submodule.

[0018] FIG. 7 is a flow diagram of the Run Modal module.

[0019] FIG. 8 is a flow diagram of the In Button? routine.

[0020] FIG. 9 is a flow diagram of the Event Handler module.

[0021] FIG. 10 is a flow diagram of the Do My Menu module.

[0022] FIGS. 11A through 11I are flow diagrams of the Language Maker menu submodules.

[0023] FIG. 12 is a flow diagram of the Write Production module.

[0024] FIG. 13 is a flow diagram of the Write Terminal submodule.

[0025] FIG. 14 is a flow diagram of the Voice Control main driver loop.

[0026] FIG. 15 is a flow diagram of the Process Input module.

[0027] FIG. 16 is a flow diagram of the Recognize submodule.

[0028] FIG. 17 is a flow diagram of the Process Voice Control Commands routine.

[0029] FIG. 18 is a flow diagram of the ProcessQ module.

[0030] FIG. 19 is a flow diagram of the Get Next submodule.

[0031] FIG. 20 is a chart of the command handlers.

[0032] FIGS. 21A through 21G are flow diagrams of the command handlers.

[0033] FIG. 22 is a flow diagram of the Post Mouse routine.

[0034] FIG. 23 is a flow diagram of the Set Mouse Down routine.

[0035] FIGS. 24 and 25 illustrate the screen displays of Voice Control.

[0036] FIGS. 26 through 29 illustrate the screen displays of Language Maker.

[0037] FIG. 30 is a listing of a language file.

SYSTEM OVERVIEW

[0038] Referring to FIG. 1, in an Apple Macintosh computer 100, a Macintosh operating system 132 provides a graphical interactive user interface by processing events received from a mouse 134 and a keyboard 136 and by providing displays including icons, windows, and menus on a display device 138. Operating system 132 provides an environment in which application programs such as MacWrite 139, desktop utilities such as Calculator 137, and a wide variety of other programs can be run.

[0039] The operating system 132 also receives events from the Voice Navigator voice controlled computer interface 102 to enable the user to control the computer by voiced utterances. For this purpose, the user speaks into a microphone 114 connected via a Voice Navigator box 112 to the SCSI (Small Computer Systems Interface) port of the computer 100. The Voice Navigator box 112 digitizes and processes analog audio signals received from the microphone 114, and transmits processed digitized audio signals to the Macintosh SCSI port. The Voice Navigator box includes an analog-to-digital converter (A/D) for digitizing the audio signal, a DSP (Digital Signal Processing) chip for compressing the resulting digital samples, and protocol interface hardware which configures the digital samples to obey the SCSI protocols.

[0040] Recognizer Software 120 (available from Dragon Systems, Newton, Mass.) runs under the Macintosh operating system, and is controlled by internal commands 123 received from Voice Control driver 128 (which also operates under the Macintosh operating system). One possible algorithm for implementing Recognizer Software 120 is disclosed by Baker et al. in U.S. Pat. No. 4,783,803, incorporated by reference herein. Recognizer Software 120 processes the incoming compressed, digitized audio, and compares each utterance of the user to prestored utterance macros. If the user utterance matches a prestored utterance macro, the utterance is recognized, and a command string 121 corresponding to the recognized utterance is delivered to a text buffer 126. Command strings 121 delivered from the Recognizer Software represent commands to be issued to the Macintosh operating system (e.g., menu selections to be made or text to be displayed), or internal commands 123 to be issued by the Voice Control driver.

[0041] During recognition, the Recognizer Software 120 compares the incoming samples of an utterance with macros in a voice file 122. (The system requires the user to space apart his utterances briefly so that the system can recognize when each utterance ends.) The voice file macros are created by a “training” process, described below. If a match is found (as judged by the recognition algorithm of the Recognizer Software 120), a Voice Control command string from a word list 124 (which has been directly associated with voice file 122) is fetched and sent to text buffer 126.
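
By way of illustration only, the following C fragment sketches the general idea of matching an incoming utterance token against stored macros and returning the associated command string. The actual matching is performed by the Dragon Systems recognition algorithm referenced above; the token format, distance measure, and names used here are hypothetical.

    #include <stdlib.h>

    #define TOKEN_LEN 8              /* length of an encoded utterance token (illustrative) */

    typedef struct {
        const char *name;            /* utterance name, e.g. "file"             */
        const char *command;         /* associated Voice Control command string */
        int macro[TOKEN_LEN];        /* stored "macro" created during training  */
    } Macro;

    /* Crude distance between an incoming token and a stored macro. */
    static long distance(const int *token, const int *macro)
    {
        long d = 0;
        for (int i = 0; i < TOKEN_LEN; i++)
            d += labs((long)token[i] - macro[i]);
        return d;
    }

    /* Return the command string of the closest macro at the current level,
     * or NULL when nothing falls within the confidence threshold (i.e. the
     * utterance is not recognized). */
    const char *recognize_utterance(const int *token, const Macro *list, int n,
                                    long threshold)
    {
        const char *best = NULL;
        long best_d = threshold;
        for (int i = 0; i < n; i++) {
            long d = distance(token, list[i].macro);
            if (d < best_d) {
                best_d = d;
                best = list[i].command;
            }
        }
        return best;
    }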

[0042] The command strings in text buffer 126 are relayed to Voice Control driver 128, which drives a Voice Control interpreter 130 in response to the strings.

[0043] A command string 121 may indicate an internal command 123, such as a command to the Recognizer Software to “learn” new voice file macros, or to adjust the sensitivity of the recognition algorithm. In this case, Voice Control interpreter 130 sends the appropriate internal command 123 to the Recognizer Software 120. In other cases, the command string may represent an operating system manipulation, such as a mouse movement. In this case, Voice Control interpreter 130 produces the appropriate action by interacting with the Macintosh operating system 132.

[0044] Each application or desktop accessory is associated with a word list 124 and a corresponding voice file 122; these are loaded by the Recognition Software when the application or desktop accessory is opened.

[0045] The voice files are generated by the Recognizer Software 120 in its “learn” mode, under the control of internal commands from the Voice Control driver 128.

[0046] The word lists are generated by the Language Maker desktop accessory 140, which creates “languages” of utterance names and associated Voice Control command strings, and converts the languages into the word lists. Voice Control command strings are strings such as “ESC”, “TEXT”, “@MENU(font,2)”, and belong to a Voice Control command set, the syntax of which will be described later and is set forth in Appendix A.
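
A minimal sketch, in C, of the pairing that a word list records between utterance names and Voice Control command strings. The structure layout and the name/command pairings shown are illustrative assumptions, not the stored word-list format used by the Recognizer Software.

    /* Illustrative only: the real word-list format is produced by the VOCAL
     * compiler and is not documented here. */
    typedef struct {
        const char *utterance_name;    /* what the user trains and sees           */
        const char *command_string;    /* what the Voice Control interpreter runs */
    } WordListEntry;

    static const WordListEntry example_entries[] = {
        { "escape",      "ESC"           },   /* keystroke-style command           */
        { "dictate",     "TEXT"          },   /* switch to carrying text strings   */
        { "second font", "@MENU(font,2)" },   /* menu selection command            */
    };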

[0047] The Voice Control and Language Maker software includes about 30,000 lines of code, most of which is written in the C language, the remainder being written in assembly language. A listing of the Voice Control and Language Maker software is provided in microfiche as Appendix C. The Voice Control software will operate on a Macintosh Plus or later models, configured with a minimum of 1 Mbyte RAM (2 Mbyte for HyperCard and other large applications), a hard disk, and with Macintosh operating system version 6.01 or later.

[0048] In order to understand the interaction of the Voice Control interpreter 130 and the operating system, note that Macintosh operating system 132 is “event driven”. The operating system maintains an event queue (not shown); input devices such as the mouse 134 or the keyboard 136 “post” events to this queue to cause the operating system to, for example, create the appropriate text entry, or trigger a mouse movement. The operating system 132 then, for example, passes messages to Macintosh applications (such as MacWrite 139) or to desktop accessories (such as Calculator 137) indicating events on the queues (if any). In one mode of operation, Voice Control interpreter 130 likewise controls the operating system (and hence the applications and desktop accessories which are currently running) by posting events to the operating system queues. The events posted by the Voice Control interpreter typically correspond to mouse activity or to keyboard keystrokes, or both, depending upon the voice commands. Thus, the Voice Navigator system 102 provides an additional user interface. In some cases, the “voice” events may comprise text strings to be displayed or included with text being processed by the application program.

[0049] At any time during the operation of the Voice Navigator system, the Recognizer Software 120 may be trained to recognize an utterance of a particular user and to associate a corresponding text string with each utterance. In this mode, the Recognizer Software 120 displays to the user a menu of the utterance names (such as “file”, “page down”) which are to be recognized. These names, and the corresponding Voice Control command strings (indicating the appropriate actions) appear in a current word list 124. The user designates the utterance name of interest and then is prompted to speak the utterance corresponding to that name. For example, if the utterance name is “file”, the user might utter “FILE” or “PLEASE FILE”. The digitized samples from the Voice Navigator box 112 corresponding to that utterance are then used by the Recognizer Software 120 to create a “macro” representing the utterance, which is stored in the voice file 122 and subsequently associated with the utterance name in the word list 124. Ordinarily, the utterance is repeated more than once, in order to create a macro for the utterance that accommodates variation in a particular speaker's voice.

[0050] The meaning of the spoken utterance need not correspond to the utterance name, and the text of the utterance name need not correspond to the Voice Control command strings stored in the word list. For example, the user may wish a command string that causes the operating system to save a file to have the utterance name “save file”; the associated command string may be “@MENU(file,2)”; and the utterance that the user trains for this utterance name may be the spoken phrase “immortalize”. The Recognizer Software and Voice Control cause that utterance, name, and command string to be properly associated in the voice file and word list 124.

[0051] Referring to FIG. 2A, the word lists 124 used by the Voice Navigator are created by the Language Maker desk accessory 140 running under the operating system. Each word list 124 is hierarchical, that is, some utterance names in the list link to sub-lists of other utterance names. Only the list of utterance names at a currently active level of the hierarchy can be recognized. (In the current embodiment, the number of utterance names at each level of the hierarchy can be as large as 1000.) In the operation of Voice Control, some utterances, such as “file”, may summon the file menu on the screen, and link to a subsequent list of utterance names at a lower hierarchical level. For example, the file menu may list subsequent commands such as “save”, “open”, or “save as”, each associated with an utterance.
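
The hierarchy can be pictured with the following C sketch, in which each utterance name may carry a sub-list and only the names at the currently active level are offered to the recognizer. The node layout and the @MENU item numbers are assumptions made for illustration.

    #include <stdio.h>
    #include <stddef.h>

    typedef struct LanguageNode {
        const char *utterance_name;          /* e.g. "file"              */
        const char *command_string;          /* e.g. "@MENU(file,2)"     */
        const struct LanguageNode *sublist;  /* lower hierarchical level */
        size_t sublist_count;
    } LanguageNode;

    /* Print the utterance names that can be recognized at the given level. */
    static void list_active_names(const LanguageNode *level, size_t count)
    {
        for (size_t i = 0; i < count; i++)
            printf("%s\n", level[i].utterance_name);
    }

    int main(void)
    {
        static const LanguageNode file_menu[] = {
            { "open",    "@MENU(file,1)", NULL, 0 },
            { "save",    "@MENU(file,2)", NULL, 0 },
            { "save as", "@MENU(file,3)", NULL, 0 },
        };
        static const LanguageNode top[] = {
            { "file", "@MENU(file,0)", file_menu, 3 },
        };

        list_active_names(top, 1);        /* only "file" is active at the top level    */
        list_active_names(file_menu, 3);  /* after "file", its sub-list becomes active */
        return 0;
    }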

[0052] Language Maker enables the user to create a hierarchical language of utterance names and associated command strings, re-arrange the hierarchy of the language, and add new utterance names. Then, when the language is in the form that the user desires, the language is converted to a word list 124. Because the hierarchy of the utterance names and command strings can be adjusted, when using the Voice Navigator system the user is not bound by the preset menu hierarchy of an application. For example, the user may want to create a “save” command at the top level of the utterance hierarchy that directly saves a file without first summoning the file menu. Also, the user may, for example, create a new utterance name “goodbye”, that saves a file and exits all at once.

[0053] Each language created by Language Maker 140 also contains the command strings which represent the actions (e.g. clicking the mouse at a location, typing text on the screen) to be associated with utterances and utterance names. In order for the training of the Voice Navigator system to be more intuitive, the user does not specify the command strings to describe the actions he wishes to be associated with an utterance and utterance name. In fact, the user does not need to know about, and never sees, the command strings stored in the Language Maker language or the resulting word list 124.

[0054] In a “record” mode, to associate a series of actions with an utterance name, the user simply performs the desired actions (such as typing the text at the keyboard, or clicking the mouse at a menu). The actions performed are converted into the appropriate command strings, and when the user turns off the record mode, the command strings are associated with the selected utterance name.

[0055] While using Language Maker, the user can cause the creation of a language by entering utterance names by typing the names at the keyboard 142, by using a “create default text” procedure 146 (to parse a text file on the clipboard, in which case one utterance name is created for each word in the text file, and the names all start at the same hierarchical level), or by using a “create default menus” procedure (to parse the executable code 144 for an application, and create a set of utterance names which equal the names of the commands in the menus of the application, in which case the initial hierarchy for the names is the same as the hierarchy of the menus in the application).

[0056] If the names are typed at the keyboard or created by parsing a text file, the names are initially associated with the keystrokes which, when typed at the keyboard, produce the name. Therefore, the name “text” would initially be associated with the keystrokes t-e-x-t. If the names are created by parsing the executable code 144 for an application, then the names are initially associated with the command strings which execute the corresponding menu commands for the application. These initial command strings can be changed by simply selecting the utterance name to be changed and putting Language Maker into record mode.

[0057] The output of Language Maker is a language file 148. This file contains the utterance names and the corresponding command strings. The language file 148 is formatted for input to a VOCAL compiler 150 (available from Dragon Systems), which converts the language file into a word list 124 for use with the Recognition Software. The syntax of language files is specified in the Voice Navigator Developer's Reference Manual, provided as Appendix D, and incorporated by reference.

[0058] Referring to FIG. 2B, a macro 147 of each learned utterance is stored in the voice file 122. A corresponding utterance name 149 and command string 151 are associated with one another and with the utterance and are stored in the word list 124. The word list 124 is created and modified by Language Maker 140, and the voice file 122 is created and modified by the Recognition Software 120 in its learn mode, under the control of the Voice Control driver 128.

[0059] Referring to FIG. 3, in the Voice Navigator system 102, the Voice Navigator hardware box 152 includes an analog-to-digital (A/D) converter 154 for converting the analog signal from the microphone into a digital signal for processing, a DSP section 156 for filtering and compacting the digitized signal, a SCSI manager 158 for communication with the Macintosh, and a microphone control section 160 for controlling the microphone.

[0060] The Voice Navigator system also includes the Recognition Software voice drivers 120 which include routines for utterance detection 164 and command execution 166. For utterance detection 164, the voice drivers periodically poll 168 the Voice Navigator hardware to determine if an utterance is being received by Voice Navigator box 152, based on the amplitude of the signal received by the microphone. When an utterance is detected 170, the voice drivers create a speech buffer of encoded digital samples (tokens) to be used by the command execution drivers 166. On command 166 from the Voice Control driver 128, the recognition drivers can learn new utterances by token-to-terminal conversion 174. The token is converted to a macro for the utterance, and stored as a terminal in a voice file 122 (FIG. 1).

[0061] Recognition and pattern matching 172 is also performed on command by the voice drivers. During recognition, a stored token of incoming digitized samples is compared with macros for the utterances in the current level of the recognition hierarchy. If a match is found, terminal to output conversion 176 is also performed, selecting the command string associated with the recognized utterance from the word list 124 (FIG. 1). State management 178, such as changing of sensitivity controls, is also performed on command by the voice drivers.

[0062] The Voice Control driver 128 forms an interface 182 to the voice drivers 120 through control commands, an interface 184 to the Macintosh operating system 132 (FIG. 1) through event posting and operating system hooks, and an interface 186 to the user through display menus and prompts.

[0063] The interface 182 to the drivers allows Voice Control access to the Voice Driver command functions 166. This interface allows Voice Control to monitor 188 the status of the recognizer, for example to check for an utterance token in the utterance queue buffered 170 to the Macintosh. If there is an utterance, and if processor time is available, Voice Control issues command sdi_recognize 190, calling the recognition and pattern match routine 172 in the voice drivers. In addition, the interface to the drivers may issue command sdi_output 192 which controls the terminal to output conversion routine 176 in the voice drivers, converting a recognized utterance to a command string for use by Voice Control. The command string may indicate mouse or keystroke events to be posted to the operating system, or may indicate commands to Voice Control itself (e.g. enabling or disabling Voice Control).

[0064] From the user's perspective, Voice Control is simply a Macintosh driver with internal parameters, such as sensitivity, and internal commands, such as commands to learn new utterances. The actual processing which the user perceives as Voice Control may actually be performed by Voice Control, or by the Voice Drivers, depending upon the function. For example, the utterance learning procedures are performed by the Voice Drivers under the control of Voice Control.

[0065] The interface 184 to the Macintosh operating system allows Voice Control, where appropriate, to manipulate the operating system (e.g., by posting events or modifying event queues). The macro interpreter 194 takes the command strings delivered from the voice drivers via the text buffer and interprets them to decide what actions to take. These commands may indicate text strings to be displayed on the display or mouse movements or menu selections to be executed.

[0066] In the interpretive execution of the command strings, Voice Control must manipulate the Macintosh event queues. This task is performed by OS event management 196. As discussed above, voice events may simulate events which are ordinarily associated with the keyboard or with the mouse. Keyboard events are handled by OS event management 196 directly. Mouse events are handled by mouse handler 198. Mouse events require an additional level of handling because mouse events can require operating system manipulation outside of the standard event post routines which are accomplished by the OS event management 196.

[0067] The main interface into the Macintosh operating system 132 is event based, and is used in the majority of the commands which are voice recognized and issued to the Macintosh. However, there are other “hooks” to the operating system state which are used to control parameters such as mouse placement and mouse motion. For example, as will be discussed later, pushing the mouse button down generates an event; however, keeping the mouse button pushed down and dragging the mouse across a menu requires the use of an operating system hook. For reference, the operating system hooks used by the Voice Navigator are listed in Appendix B.

[0068] The operating system hooks are implemented by the trap filters 200, which are filters used by Voice Control to force the Macintosh operating system to accept the controls implemented by OS event management 196 and mouse handler 198.

[0069] The Macintosh operating system traps are held in Macintosh read only memories (ROMs), and implement high level commands for controlling the system. Examples of these high level commands are: drawing a string onto the screen, window zooming, moving windows to the front and back of the screen, and polling the status of the mouse button. In order for the Voice Control driver to properly interface with the Macintosh operating system it must control these operating system traps to generate the appropriate events.

[0070] To generate menu events, for example, Voice Control “seizes” the menu select trap (i.e. takes control of the trap from the operating system). Once Voice Control has seized the trap, application requests for menu selections are forwarded to Voice Control. In this way Voice Control is able to modify, where necessary, the operating system output to the program, thereby controlling the system behavior as desired.
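
The seize-the-trap idea can be sketched in portable C with a plain function pointer standing in for the Menu Select trap. The real implementation patches the Macintosh trap dispatch table (see Appendix B); the names and the packing of the return value below are illustrative only.

    #include <stdio.h>

    typedef long (*MenuSelectProc)(int h, int v);

    /* Stands in for the ROM Menu Select trap. */
    static long rom_menu_select(int h, int v)
    {
        printf("ROM MenuSelect at (%d,%d)\n", h, v);
        return 0;                               /* no selection */
    }

    /* Applications call through this pointer; seizing the trap means
     * repointing it at Voice Control's own routine. */
    static MenuSelectProc menu_select_trap = rom_menu_select;

    static int queued_menu = -1, queued_item = -1;   /* selection queued by a voiced command */

    /* Voice Control's replacement: substitute the queued selection. */
    static long my_menu_select(int h, int v)
    {
        if (queued_menu >= 0) {
            long result = ((long)queued_menu << 16) | (long)queued_item;
            queued_menu = queued_item = -1;
            return result;                      /* application sees the voiced selection */
        }
        return rom_menu_select(h, v);           /* otherwise behave normally */
    }

    int main(void)
    {
        menu_select_trap = my_menu_select;      /* "seize" the trap             */
        queued_menu = 3; queued_item = 2;       /* e.g. third menu, second item */
        printf("selection = 0x%08lx\n", (unsigned long)menu_select_trap(0, 0));
        return 0;
    }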

[0071] The interface 186 to the user provides user control of the Voice Control operations. Prompts 202 display the name of each recognized utterance on the Macintosh screen so that the user may determine if the proper utterance has been recognized. On-line training 204 allows the user to access, at any time while using the Macintosh, the utterance names in the word list 124 currently in use. The user may see which utterance names have been trained and may retrain the utterance names in an on-line manner (these functions require Voice Control to use the Voice Driver interface, as discussed above). User options 206 provide selection of various Voice Control settings, such as the sensitivity and confidence level of the recognizer (i.e., the level of certainty required to decide that an utterance has been recognized). The optimal values for these parameters depend upon the microphone in use and the speaking voice of the user.

[0072] The interface 186 to the user does not operate via the Macintosh event interface. Rather, it is simply a recursive loop which controls the Recognition Software and the state of the Voice Control driver.

[0073] Language Maker 140 includes an application analyzer 210 and an event recorder 212. Application analyzer 210 parses the executable code of applications as discussed above, and produces suitable default utterance names and pre-programmed command strings. The application analyzer 210 includes a menu extraction procedure 214 which searches executable code to find text strings corresponding to menus. The application analyzer 210 also includes control identification procedures 216 for creating the command strings corresponding to each menu item in an application.

[0074] The event recorder 212 is a driver for recording user commands and creating command strings for utterances. This allows the user to easily create and edit command strings as discussed above.

[0075] Types of events which may be entered into the event recorder include: text entry 218, mouse events 220 (such as clicking at a specified place on the screen), special events 222 which may be necessary to control a particular application, and voice events 224 which may be associated with operations of the Voice Control driver.

Language Maker

[0076] Referring to FIG. 4, the Language Maker main event loop 230 is similar in structure to main event loops used by other desk accessories in the Macintosh operating system. If a desk accessory is selected from the “Apple” menu, an “open” event is transmitted to the accessory. In general, if the application in which it resides quits or if the user quits it using its menus, a “close” event is transmitted to the accessory. Otherwise, the accessory is transmitted control events. The message parameter of a control event indicates the kind of event. As seen in FIG. 4, the Language Maker main event loop 230 begins with an analysis 232 of the event type.

[0077] If the event is an open event, Language Maker tests 234 whether it is already opened. If Language Maker is already opened 236, the current language (i.e. the list of utterance names from the current word list) is displayed and Language Maker returns 237 to the operating system. If Language Maker is not open 238, it is initialized and then returns 239 to the operating system.

[0078] If the event is a close event, Language Maker prompts the user 240 to save the current language as a language file. If the user commands Language Maker to save the current language, the current language is converted by the Write Production module 242 to a language file, and then Language Maker exits 244. If the current language is not saved, Language Maker exits directly.

[0079] If the event is a control event 246, then the way in which Language Maker responds to the event depends upon the mode that Language Maker is in, because Language Maker has a utility for recording events (i.e. the mouse movements and clicks or text entry that the user wishes to assign to an utterance), and must record events which do not involve the Language Maker window. However, when not recording, Language Maker should only respond to events in its window. Therefore, Language Maker may respond to events in one mode but not in another.

[0080] A control event 246 is forwarded to one of three branches 248, 250, 252. All menu events are forwarded to the accMenu branch 252. (Only menu events occurring in desk accessory menus will be forwarded to Language Maker.) All window events for the Language Maker window are forwarded to the accEvent branch 250. All other events received by Language Maker, which correspond to events for desktop accessories or applications other than Language Maker, initiate activity in the accRun branch 248, to enable recording of actions.

[0081] In the accRun branch 248, events are recorded and associated with the selected utterance name. Before any events are recorded Language Maker checks 254 if Language Maker is recording; if not, Language Maker returns 256. If recording is on 258, then Language Maker checks the current recording mode.

[0082] While recording, Language Maker seizes control of the operating system by setting control flags that cause the operating system to call Language Maker every tick of the Macintosh (i.e. every 1/60 second).

[0083] If the user has set Language Maker in dialog mode, Language Maker can record dialog events (i.e. events which involve modal dialog, where the user cannot do anything except respond to the actions in modal dialog boxes). To accomplish this, the user must be able to produce actions (i.e. mouse clicks, menu selections) in the current application so that the dialog boxes are prompted to the screen. Then the user can initialize recording and respond to the dialog boxes. When modal dialog boxes should be produced, events received by Language Maker are also forwarded to the operating system. Otherwise, events are not forwarded to the operating system. Language Maker's modal dialog recording is performed by the Run Modal module 260.

[0084] If modal dialog events are not being recorded, the user records with Language Maker in “action” mode, and Language Maker proceeds to the Run Edit module 262.

[0085] In the accEvent branch, all events are forwarded to the Event Handler module 264.

[0086] In the accMenu branch, the menu indicated by the desk accessory menu event is checked 266. If the event occurred in the Language Maker menu, it is forwarded to the Do My Menu module 268. Other events are ignored 270.

[0087] Referring to FIG. 5, the Run Edit module 262 performs a loop 272, 274. Each action is recorded by the Record Actions submodule 272. If there are more actions in the event queue then the loop returns to the Record Actions submodule. If a cancel action appears 276 in the event queue then Run Edit returns 277 without updating the current language in memory. Otherwise, if the events are completed successfully, Run Edit updates the language in memory and turns off recording 278 and returns to the operating system 280.

[0088] Referring to FIG. 6, in the Record Actions submodule 272, actions performed by the user in record mode are recorded. When the current application makes a request for the next event on the event queue, the event is checked by Record Actions. Each non-null event (i.e. each action) is processed by Record Actions. First, the type of action is checked 282. If the action selects a menu 284, then the selected menu is recorded. If the action is a mouse click 286, the In Button? routine (see FIG. 8) checks if the click occurred inside of a button (a button is a menu selection area in the front window) or not. If so, the button is recorded 288. If not, the location of the click is recorded 290.

[0089] Other actions are recorded by special handlers. These actions include group actions 292, mouse down actions 294, mouse up actions 296, zoom actions 298, grow actions 300, and next window actions 302.

[0090] Some actions in menus can create pop-up menus with subchoices. These actions are handled by popping up the appropriate pop-up menu so that the user may select the desired subchoice. Move actions 304, pause actions 306, scroll actions 308, text actions 310 and voice actions 312 pop up respective menus and Record Actions checks 314 for the menu selection made by the user (with a mouse drag). If no menu selection is made, then no action is recorded 316. Otherwise, the choice is recorded 318.

[0091] Other actions may launch applications. In this case 320 the selected application is determined. If no application has been selected then no action is recorded 322, otherwise the selected application is recorded 324.

[0092] Referring to FIG. 7, the Run Modal procedure 260 allows recording of the modal dialogs of the Macintosh computer. During modal dialogs, the user cannot do anything except respond to the actions in the modal dialog box. In order to record responses to those actions, Run Modal has several phases, each phase corresponding to a step in the recording process.

[0093] In the first phase, when the user selects dialog recording, Run Modal prompts the user with a Language Maker dialog box that gives the user the options “record” and “cancel” (see FIG. 25). The user may then interact with the current application until arriving at the dialog click that is to be recorded. During this phase, all calls to Run Modal are routed through Select Dialog 326, which produces the initial Language Maker dialog box, and then returns 327, ignoring further actions.

[0094] To enter the second, recording, phase, the user clicks on the “record” button in the Language Maker dialog box, indicating that the following dialog responses are to be recorded. In this phase, calls to Run Modal are routed to Record 328, which uses the In Button? routine 330 to check if a button in the current application's dialog box has been selected. If the click occurred in a button, then the button is recorded 332, and Run Modal returns 333. Otherwise, the location of the click is recorded 334 and Run Modal returns 335.

[0095] Finally, when all clicks are recorded, the user clicks on the “cancel” button in the Language Maker dialog box, entering the third phase of the recording session. The click in the “cancel” button causes Run Modal to route to Cancel 336, which updates 338 the current language in memory, then returns 340.

[0096] Referring to FIG. 8, the In Button? procedure 286 determines whether a mouse click event occurred on a button. In Button? gets the current window control list 342 (a Macintosh global which contains the locations of all of the button rectangles in the current window, refer to Appendix B) from the operating system and parses the list with a loop 344-350. Each control is fetched 350, and then the rectangle of the control is found 346. Each rectangle is analyzed 348 to determine if the click occurred in the rectangle. If not, the next control is fetched 350, and the loop recurses. If, 344, the list is emptied, then the click did not occur on a button, and no is returned 352. However, if the click did occur in a rectangle, then, if, 351, the rectangle is named, the click occurred on a button, and yes is returned 354; if the rectangle is not named 356, the click did not occur on a button, and no is returned 356.
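
A compact sketch of the In Button? test, written in C against a hypothetical control list; the real routine walks the Macintosh window control list described in Appendix B, and the types below are stand-ins.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { int left, top, right, bottom; } Rect;
    typedef struct { Rect rect; const char *title; } Control;   /* title == NULL: unnamed */

    static bool point_in_rect(int h, int v, const Rect *r)
    {
        return h >= r->left && h < r->right && v >= r->top && v < r->bottom;
    }

    /* True only when the click lands inside a *named* control rectangle,
     * mirroring the "is the rectangle named?" test of FIG. 8. */
    bool in_button(int h, int v, const Control *controls, size_t count)
    {
        for (size_t i = 0; i < count; i++)
            if (point_in_rect(h, v, &controls[i].rect))
                return controls[i].title != NULL;
        return false;
    }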

[0097] Referring to FIG. 9, the Event Handler module 264 deals with standard Macintosh events in the Language Maker display window. The Language Maker display window lists the utterance names in the current language. As shown in FIG. 9, Event Handler determines 358 whether the event is a mouse or keyboard event and subsequently performs the proper action on the Language Maker window.

[0098] Mouse events include: dragging the window 360, growing the window 362, scrolling the window 364, clicking on the window 368 (which selects an utterance name), and dragging on the window 370 (which moves an utterance name from one location on the screen to another, potentially changing the utterance's position in the language hierarchy). Double-clicking 366 on an utterance name in the window selects that utterance name for action recording, and therefore starts the Run Edit module.

[0099] Keyboard events include the standard cut 372, copy 374, and paste 376 routines, as well as cursor movements down 380, up 382, right 384, and left 386. Pressing return at the keyboard 378, as with a double click at the mouse, selects the current utterance name for action recording by Run Edit. After the appropriate command handler is called, Event Handler returns 388. The modifications to the language hierarchy performed by the Event Handler module are reflected in the hierarchical structure of the language file produced by the Write Production module during close and save operations.

[0100] Referring to FIG. 10, the Do My Menu module 268 controls all of the menu choices supported by Language Maker. After summoning the appropriate submodule (discussed in detail in FIGS. 11A through 11I), Do My Menu returns 408.

[0101] Referring to FIG. 11A, the New submodule 390 creates a new language. The New submodule first checks 410 if Language Maker is open. If so, it prompts the user 412 to save the current language as a language file. If the user saves the current language, New calls Write Production module 414 to save the language. New then calls Create Global Words 416 and forms a new language 418. Create Global Words 416 will automatically enter a few global (i.e. resident in all languages) utterance names and command strings into the new language. These utterance names and command strings allow the user to make Voice Control commands, and correspond to utterances such as “show me the active words” and “bring up the voice options” (the utterance macros for the corresponding voice file are trained by the user, or copied from an existing voice file, after the new language is saved).

[0102] Referring to FIG. 11B, the Open submodule 392 opens an existing language for modification. The Open submodule 392 checks 420 if Language Maker is open. If so, it prompts the user 422 to save the current language, calling Write Production 424 if yes. Open then prompts the user to open the selected language 426. If the user cancels, Open returns 428. Otherwise, the language is loaded 430 and Open returns 432.

[0103] Referring to FIG. 11C, the Save submodule 394 saves the current language in memory as a language file. Save prompts the user to save the current language 434. If the user cancels, Save returns 436, otherwise, Save calls Write Production 438 to convert the language into a state machine control file suitable for use by VOCAL (FIG. 2). Finally, Save returns 440.

[0104] Referring to FIG. 11D, the New Action submodule 396 initializes the event recorders to begin recording a new sequence of actions. New Action initializes the event recorder by displaying an action window to the user 442, setting up a tool palette for the user to use, and initializing recording of actions. Then New Action returns 444. After New Action is started, actions are not delivered to the operating system directly; rather they are filtered through Language Maker.

[0105] Referring to FIG. 11E, the Record Dialog submodule 398 records responses to dialog boxes through the use of the Run Modal module. Record Dialog 398 gives the user a way to record actions in modal dialog; otherwise the user would be prevented from performing the actions which bring up the dialog boxes. Record Dialog displays 446 the dialog action window (see FIG. 25) and turns recording on. Then Record Dialog returns 448.

[0106] Referring to FIG. 11F, the Create Default Menus submodule 400 extracts default utterance names (and generates associated command strings) from the executable code for an application. Create Default Menus 400 is ordinarily the first choice selected by a user when creating a language for a particular application. This submodule looks at the executable code of an application and creates an utterance name for each menu command in the application, associating the utterance name with a command string that will select that menu command. When called, Create Default Menus gets 450 the menu bar from the executable code of the application, and initializes the current menu to be the first menu (X=1). Next, each menu is processed recursively. When all menus are processed, Create Default Menus returns 454. A first loop 452, 456, 458, 460 locates the current (Xth) menu handle 456, initializes menu parsing, checks if the current menu is fully parsed 458, and reiterates by updating the current menu to the next menu. A second loop 458, 462, 464 finds each menu name 462, and checks 464 if the name is hierarchical (i.e. if the name points to further menus). If the names are not hierarchical, the loop recurses. Otherwise, the hierarchical menu is fetched 466, and a third loop 470, 472 starts. In the third loop, each item name in the hierarchical menu is fetched 472, and the loop checks if all hierarchical item names have been fetched 470.
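
The nested-loop structure of Create Default Menus can be pictured with the following C sketch over a hypothetical in-memory menu model. The real submodule extracts the menu bar from an application's executable code, and the @MENU argument convention shown follows the examples elsewhere in this description rather than the actual command syntax of Appendix A.

    #include <stdio.h>

    typedef struct { const char *name; } MenuItem;
    typedef struct { const char *title; const MenuItem *items; int item_count; } Menu;

    /* Emit one utterance name and one menu-selection command string per item,
     * keeping the application's own menu grouping (one level shown here). */
    static void create_default_menus(const Menu *menus, int menu_count)
    {
        for (int m = 0; m < menu_count; m++)               /* first loop: each menu */
            for (int i = 0; i < menus[m].item_count; i++)  /* inner loop: each item */
                printf("%s -> @MENU(%s,%d)\n",
                       menus[m].items[i].name, menus[m].title, i + 1);
    }

    int main(void)
    {
        const MenuItem file_items[] = { { "New" }, { "Open" }, { "Save" } };
        const Menu menus[] = { { "file", file_items, 3 } };
        create_default_menus(menus, 1);   /* e.g.: Save -> @MENU(file,3) */
        return 0;
    }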

[0107] Referring to FIG. 11G, the Create Default Text submodule 402 allows the user to convert a text file on the clipboard into a list of utterance names. Create Default Text 402 creates an utterance name for each unique word in the clipboard 474, and then returns 476. The utterance names are associated with the keyboard entries which will type out the name. For example, a business letter can be copied from the clipboard into default text. Utterances would then be associated with each of the common business terms in the letter. After ten or twelve business letters have been converted, the majority of the business letter words would be stored as a set of utterances.

[0108] Referring to FIG. 11H, the Alphabetize Group submodule 404 allows the user to alphabetize the utterance names in a language. The selected group of names (created by dragging the mouse over utterance names in the Language Maker window) is alphabetized 478, and then Alphabetize Group returns 480.

[0109] Referring to FIG. 11I, the Preferences submodule 406 allows the user to select standard graphic user interface preferences such as font style 482 and font size 484. The Preferences submenu 486 allows the user to state the metric by which mouse locations of recorded actions are stored. The coordinates for mouse actions can be relative to the global window coordinates or relative to the application window coordinates. In the case where application menu selections are performed by mouse clicks, the mouse clicks must always be in relative coordinates so that the window may be moved on the screen without affecting the function of the mouse click. The Preferences submenu 486 also determines whether, when a mouse action is recorded, the mouse is left at the location of a click or returned to its original location after a click. When the preference selections are done 488, the user is prompted whether he wants to update the current preference settings for Language Maker. If so, the file is updated 490 and Preferences returns 492. If not, Preferences returns directly to the operating system 494 without saving.

[0110] Referring to FIG. 12, the Write Production module 242 is called when a file is saved. Write Production saves the current language and converts it from an outline processor format such as that used in the Language Maker application to a hierarchical text format suitable for use with the state machine based Recognition Software. Language files are associated with applications and new language files can be created or edited for each additional application to incorporate the various commands of the application into voice recognition.

[0111] The embodiment of the Write Production module depends upon the Recognition Software in use. In general, the Write Production module is written to convert the current language to a suitable format for the Recognition Software in use. The particular embodiment of Write Production shown in FIG. 12 applies to the syntax of the VOCAL compiler for the Dragon Systems Recognition Software.

[0112] Write Production first tests the language 494 to determine if there are any sub-levels. If not, the Write Terminal submodule 496 saves the top level language, and Write Production returns 498. If sub-levels exist in the language, then each sub-level is processed by a tail-recursive loop. If a root entry exists in the language 500 (i.e. if only one utterance name exists at the current level) then Write Production writes 502 the string “Root=(” to the file, and checks for sub-levels 512. Otherwise, if no root exists, Write Terminal is called 504 to save the names in the current level of the language. Next, the string “TERMINAL =” is written 506, and if, 508, the language level is terminal, the string “(” is written. Next, Write Production checks 512 for sub-levels in the language. If no sub-levels exist, Write Production returns 514. Otherwise, the sub-levels are processed by another call 516 to Write Production on the sub-level of the language. After the sub-level is processed, Write Production writes the string “)” and returns 518.

[0113] Referring to FIG. 13, the Write Terminal submodule 496 writes each utterance name and the associated command string to the language file. First, Write Terminal checks 520 if it is at a terminal. If not, it returns 530. Otherwise, Write Terminal writes 522 the string corresponding to the utterance name to the language file. Next, if, 524, there is an associated command string, Write Terminal writes the command string (i.e. “output”) to the language file. Finally, Write Terminal writes 528 the string “;” to the language file and returns 530.
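
The following C sketch shows the shape of one Write Terminal step, using the strings named above (the utterance name, an optional output string, and a terminating “;”). The exact VOCAL language-file syntax is specified in Appendix D, so the quoting of the output string here is only an assumption.

    #include <stdio.h>

    typedef struct {
        const char *name;      /* utterance name                     */
        const char *output;    /* associated command string, or NULL */
    } TerminalEntry;

    /* Write one terminal: the utterance name, an optional output string,
     * and the closing ";". */
    static void write_terminal(FILE *f, const TerminalEntry *t)
    {
        fprintf(f, "%s", t->name);
        if (t->output != NULL)
            fprintf(f, " \"%s\"", t->output);
        fprintf(f, ";\n");
    }

    int main(void)
    {
        const TerminalEntry save = { "save file", "@MENU(file,2)" };
        write_terminal(stdout, &save);    /* e.g.: save file "@MENU(file,2)"; */
        return 0;
    }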

Voice Control

[0114] The Voice Control software serves as a gate between the operating system and the applications running on the operating system. This is accomplished by setting the Macintosh operating system's get_next_event procedure equal to a filter procedure created by Voice Control. The get_next_event procedure runs when each next_event request is generated by the operating system or by applications. Ordinarily the get_next_event procedure is null, and next_event requests go directly to the operating system. The filter procedure passes control to Voice Control on every request. This allows Voice Control to perform voice actions by intercepting mouse and keyboard events, and create new events corresponding to spoken commands.
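
Conceptually, the gate looks like the following C sketch, in which an ordinary function pointer stands in for the operating system's get_next_event hook. The event record, the installation step, and the function names other than get_next_event are simplifications, not the actual Macintosh interfaces.

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int what; long message; } Event;    /* simplified event record */
    typedef bool (*GetNextEventProc)(Event *out);

    /* Stands in for the operating system's own next-event service. */
    static bool os_get_next_event(Event *out)
    {
        out->what = 0;                                    /* null event */
        out->message = 0;
        return false;
    }

    static GetNextEventProc get_next_event = os_get_next_event;

    static void process_input(void)
    {
        /* check for new speech input and, if any, queue the resulting events */
    }

    /* The filter runs on every next_event request, so Voice Control gets a
     * chance to process speech before the application sees the event. */
    static bool voice_control_filter(Event *out)
    {
        process_input();
        return os_get_next_event(out);
    }

    int main(void)
    {
        get_next_event = voice_control_filter;            /* install the gate */
        Event e;
        get_next_event(&e);
        printf("event what=%d\n", e.what);
        return 0;
    }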

[0115] The Voice Control filter procedure is shown in FIG. 14.

[0116] After installation 538, the get_next_event filter procedure 540 is called before an event is generated by the operating system. The event is first checked 542 to see if it is a null event. If so, the Process Input module 544 is called directly. The Process Input routine 544 checks for new speech input and processes any that has been received. After Process Input, the Voice Control driver proceeds through normal filter processing 546 (i.e., any filter processing caused by other applications) and returns 548. If the next event is not a null event, then displays are hidden 550. This allows Voice Control to hide any Voice Control displays (such as current language lists) which could have been generated by a previous non-null action. Therefore, if any prompt windows have been produced by Voice Control, when a non-null event occurs, the prompt windows are hidden. Next, key down events are checked 552. Because the recognizer is controlled (i.e. turned on and off) by certain special key down events, if the event is a key down event then Voice Control must do further processing. Otherwise, the Voice Control driver procedure moves directly to Process Input 544. If a key down event has occurred 554, where appropriate, software latches which control the recognizer are set. This allows activation of the Recognizer Software, the selection of Recognizer options, or the display of languages. Thereafter, the Voice Control driver moves to Process Input 544.

[0117] Referring to FIG. 15, the Process Input routine is the heart of the Voice Control driver. It manages all voice input for the Voice Navigator. The Process Input module is called each time an event is processed by the operating system. First 546, any latches which need to be set are processed, and the Macintosh waits for a number of delay ticks, if necessary. Delay ticks are included, for example, where a menu drag is being performed by Voice Control, to allow the menu to be drawn on the screen before starting the drag. Also, some applications require a delay between mouse or keyboard events. Next, if recognition is activated 548 the Process Input routine proceeds to do recognition 562. If recognition is deactivated, Process Input returns 560.

[0118] The recognition routine 562 prompts the recognition drivers to check for an utterance (i.e., sound that could be speech input). If there is recognized speech input 564, Process Input checks the vertical blanking interrupt (VBL) handler 566, and deactivates it where appropriate.

[0119] The vertical blanking interrupt cycle is a very low level cycle in the operating system. Every time the screen is refreshed, as the raster is moving from the bottom right to the top left of the screen, the vertical blanking interrupt time occurs. During this blanking time, very short and very high priority routines can be executed. The cycle is used by the Process Input routine to move the mouse continuously by very slowly incrementing the mouse coordinates where appropriate. To accomplish this, mouse move events are installed onto the VBL queue. Therefore, where appropriate, the VBL handler must be deactivated to move the mouse.
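
The continuous mouse move can be pictured with the C sketch below, where one call per (simulated) vertical-blanking tick nudges the cursor a single pixel toward its target. The real code installs such a task on the Macintosh VBL queue; the step size and the loop shown here are illustrative.

    #include <stdio.h>

    typedef struct { int h, v; } Point;

    static int step_toward(int from, int to)
    {
        if (from < to) return from + 1;
        if (from > to) return from - 1;
        return from;
    }

    /* One VBL-task invocation: move the cursor by at most one pixel. */
    static void vbl_mouse_task(Point *cursor, Point target)
    {
        cursor->h = step_toward(cursor->h, target.h);
        cursor->v = step_toward(cursor->v, target.v);
    }

    int main(void)
    {
        Point cursor = { 0, 0 }, target = { 3, 2 };
        while (cursor.h != target.h || cursor.v != target.v) {
            vbl_mouse_task(&cursor, target);   /* would run once per screen refresh */
            printf("(%d,%d)\n", cursor.h, cursor.v);
        }
        return 0;
    }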

[0120] Other speech input is placed 568 on a speech queue, which stores speech related events for the processor until they can be handled by the ProcessQ routine. However, regardless of whether speech is recognized, ProcessQ 570 is always called by Process Input. Therefore, the speech events queued to ProcessQ are eventually executed, but not necessarily in the same Process Input cycle. After calling ProcessQ, Process Input returns 571.

[0121] Referring to FIG. 16, the Recognize submodule 562 checks for encoded utterances queued by the Voice Navigator box, and then calls the recognition drivers to attempt to recognize any utterances. Recognize returns the number of commands in (i.e. the length of) the command string returned from the recognizer. If, 572, no utterance is returned from the recognizer, then Recognize returns a length of zero (574), indicating no recognition has occurred. If an utterance is available, then Recognize calls sdi_recognize 576, instructing the Recognizer Software to attempt recognition on the utterance. If, 578, recognition is successful, then the name of the utterance is displayed 582 to the user. At the same time, any close call windows (i.e. windows associated with close call choices, prompted by Voice Control in response to the Recognizer Software) are cleared from the display. If recognition is unsuccessful, the Macintosh beeps 580 and zero length is returned 574.

[0122] If recognition is successful, Recognize searches 584 for an output string associated with the utterance. If there is an output string, Recognize checks if the recognizer is asleep 586. If it is not asleep 590, the output count is set to the length of the output string and, if the command is a control command 592 (such as “go to sleep” or “wake up”), it is handled by the Process Voice Commands routine 594.

[0123] If there is no output string for the recognized utterance, or if the recognizer is asleep, then the output of Recognize is zero (588). After the output count is determined 596, the state of the recognizer is processed 596. At this time, if the Voice Control state flags have been modified by any of the Recognize subroutines, the appropriate actions are initialized. Finally, Recognize returns 598.

[0124] Referring to FIG. 17, the Process Voice Commands module deals with commands that control the recognizer. The module may perform actions, or may flag actions to be performed by the Process States block 596 (FIG. 16). If the recognizer is put to sleep 600 or awakened 604, the appropriate flags are set 602, 606, and zero is returned 626, 628 for the length of the command string, indicating to Process States to take no further actions. Otherwise, if the command is scratch_that 608 (ignore last utterance), first_level 612 (go to top of language hierarchy, i.e. set the Voice Control state to the root state for the language), word_list 616 (show the current language), or voice options 620, the appropriate flags are set 610, 614, 618, 622, and a string length of −1 is returned 624, 628, indicating that the recognizer state should be changed by Process States 596 (FIG. 16).

[0125] Referring to FIG. 18, the ProcessQ module 570 pulls speech input from the speech queue and processes it. If, 630, the event queue is empty then ProcessQ may proceed, otherwise ProcessQ aborts 632 because the event queue may overflow if speech events are placed on the queue along with other events. If, 634, the speech queue has any events then ProcessQ checks to see if, 636, delay ticks for menu drawing or other related activities have expired. If no events are on the speech queue, ProcessQ aborts 636. If delay ticks have expired, then ProcessQ calls Get Next 642 and returns 644. Otherwise, if delay ticks have not expired, ProcessQ aborts 640.

[0126] Referring to FIG. 19, the Get Next submodule 642 gets characters from the speech queue and processes them. If, 646, there are no characters in the speech queue then the procedure simply returns 648. If there are characters in the speech queue then Get Next checks 650 to see if the characters are command characters. If they are, then Get Next calls Check Command 660. If not, then the characters are text, and Get Next sets the meta bits 652 where appropriate.

[0127] When the Macintosh posts an event, the meta bits (see Appendix B) are used as flags for conditioning keystrokes such as the control key, the option key, or the command key. These keys condition the character pressed at the keyboard and create control characters. To create the proper operating system events, therefore, the meta bits must be set where necessary. Once the meta bits are set 652, a key down event is posted 654 to the Macintosh event queue, simulating a key push at the keyboard. Following this, a key up is posted 656 to the event queue, simulating a key up. If, 658, there is still room in the event queue, then further speech characters are obtained and processed 646. If not, then the Get Next procedure returns 676.
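
The keystroke simulation described above can be sketched as follows. The modifier masks and the post_event helper are hypothetical stand-ins; the actual Macintosh event record and modifier bits differ.

#include <stdio.h>

/* Illustrative modifier masks; the real Macintosh modifier bits differ. */
enum { META_SHIFT = 1 << 0, META_OPTION = 1 << 1,
       META_COMMAND = 1 << 2, META_CONTROL = 1 << 3 };

enum { EVT_KEY_DOWN, EVT_KEY_UP };

/* Hypothetical stand-in for posting an event to the system event queue. */
static void post_event(int kind, char ch, unsigned meta) {
    printf("event kind=%d char=%c meta=0x%X\n", kind, ch, meta);
}

/* Simulate one keystroke: set meta bits where needed, then key down, key up. */
static void post_keystroke(char ch, unsigned meta_bits) {
    post_event(EVT_KEY_DOWN, ch, meta_bits);   /* simulated key push    */
    post_event(EVT_KEY_UP,   ch, meta_bits);   /* simulated key release */
}

int main(void) {
    post_keystroke('s', META_COMMAND);   /* e.g. a voiced "save" as Command-S */
    return 0;
}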

[0128] If the command string input corresponds to a command rather than simple keystrokes, the string is handled by the Check Command procedure 660 as illustrated in FIG. 19. In the Check Command procedure 660, the next four characters from the speech queue (four characters is the length of all command strings, see Appendix A) are fetched 662 and compared 664 to a command table. If, 666, the characters equal a voice command, then a command is recognized, and processing is continued by the Handle Command routine 668. Otherwise, the characters are interpreted as text and processing returns to the meta bits step 652.

[0129] In the Handle Command procedure 668, each command is referenced into a table of command procedures by first computing 670 the command handler offset into the table, then referencing the table and calling the appropriate command handler 672. After calling the appropriate command handler, Get Next exits the Process Input module directly 674 (the structure of the software is such that a return from Handle Command would return to the meta bits step 652, which would be incorrect).
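
The Check Command comparison and the Handle Command dispatch can be modeled together as a table of four-character codes paired with function pointers. This is a simplified sketch; the handler names and table layout are hypothetical and only illustrate the lookup-and-dispatch technique.

#include <stdio.h>
#include <string.h>

typedef void (*CommandHandler)(const char *args);

/* Hypothetical handlers standing in for those of FIGS. 21A-21G. */
static void handle_menu(const char *args)  { printf("MENU %s\n", args); }
static void handle_ctrl(const char *args)  { printf("CTRL %s\n", args); }
static void handle_zoom(const char *args)  { printf("ZOOM %s\n", args); }

/* All command strings begin with a four-character code (Appendix A). */
static const struct {
    char           code[5];
    CommandHandler handler;
} command_table[] = {
    { "MENU", handle_menu },
    { "CTRL", handle_ctrl },
    { "ZOOM", handle_zoom },
};

/* Check Command: fetch four characters and compare them to the table.
 * Handle Command: index the table and call the matching handler.
 * Returns 1 if a command was dispatched, 0 if the text is ordinary keystrokes. */
static int check_command(const char *speech, const char *args) {
    size_t i;
    for (i = 0; i < sizeof command_table / sizeof command_table[0]; i++) {
        if (strncmp(speech, command_table[i].code, 4) == 0) {
            command_table[i].handler(args);
            return 1;
        }
    }
    return 0;   /* not a command: caller falls through to the meta bits step */
}

int main(void) {
    check_command("MENU", "(apple,calculator)");
    return 0;
}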

[0130] The command handlers available to the Handle Command routine are illustrated in FIG. 20. Each command handler is detailed by a flow diagram in FIGS. 21A through 21G. The syntax for the commands is detailed in Appendix A.

[0131] Referring to FIG. 21A, the Menu command will pull down a menu; for example, @MENU(apple,0) (where apple identifies the apple menu) will pull down the apple menu. Menu command will also select an item from the menu; for example, @MENU(apple,calculator) (where calculator identifies the calculator item in the apple menu) will select the calculator from the apple menu. Menu command initializes by running the Find Menu routine 678, which queues the menu id and the item number for the selected menu. (If the item number in the menu is 0 then Find Menu simply clicks on the menu bar.) After Find Menu returns, if, 680, there are no menus queued for posting, the Menu command simply returns 690. However, if menus are queued for posting, Menu command intercepts 682 one of the Macintosh internal traps called Menu Select. The Menu Select trap is set equal to the My Menu Select routine 692. Next the cursor is hidden 684 so that the mouse cannot be seen as it moves on the screen. Next, Menu command posts 686 a mouse down (i.e. pushes the mouse button down) on the menu bar. When the mouse down occurs on the menu bar the Macintosh operating system generates a menu event for the application. Each application receiving a menu event requests service from the operating system to find out what the menu event is. To do this the application issues a Menu Select trap. The Menu Select trap then places the location of the mouse on the stack. However, when the application issues a Menu Select trap in this case, it is serviced by the My Menu Select routine 692 instead, thereby allowing Menu command to insert the desired menu coordinates in place of the real coordinates. After posting a mouse down in the appropriate menu bar, Menu command sets 688 the wait ticks to 30, which gives the operating system time to draw the menu, and returns 690.

[0132] In the My Menu Select trap 692 the menuselect global state is reset 694 to clear any previously selected menus, and the desired menu id and the item number are moved to the Macintosh stack 696, thus selecting the desired menu item.
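
The trap interception of paragraphs [0131] and [0132] amounts to swapping a routine pointer: while the swap is in effect, the application's request for the menu selection is answered with the queued menu id and item number rather than the real mouse location. The following C sketch models the idea with an ordinary function pointer; SelectResult, real_menu_select, and my_menu_select are hypothetical names, not the Macintosh trap mechanism itself.

#include <stdio.h>

typedef struct { int menu_id; int item; } SelectResult;

/* Desired menu coordinates queued by Find Menu before the mouse down is posted. */
static SelectResult queued_selection = { 0, 0 };

/* The "real" routine: would report the item actually under the mouse. */
static SelectResult real_menu_select(void) {
    SelectResult r = { 0, 0 };            /* nothing chosen */
    return r;
}

/* Voice Control's replacement: returns the queued menu id and item instead. */
static SelectResult my_menu_select(void) {
    SelectResult r = queued_selection;
    queued_selection.menu_id = queued_selection.item = 0;  /* reset the state */
    return r;
}

/* The swappable "trap": applications call through this pointer. */
static SelectResult (*menu_select_trap)(void) = real_menu_select;

int main(void) {
    SelectResult r;
    queued_selection.menu_id = 1;            /* e.g. the apple menu        */
    queued_selection.item    = 3;            /* e.g. the calculator item   */
    menu_select_trap = my_menu_select;       /* intercept the trap         */

    r = menu_select_trap();                  /* application services the menu event */
    printf("menu %d, item %d selected\n", r.menu_id, r.item);

    menu_select_trap = real_menu_select;     /* restore the original trap  */
    return 0;
}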

[0133] The Find Menu routine 700 collects 702 the command parameters for the desired menu. Next, the menuname is compared 704 to the menu name list. If, 706, there is no menu with the name “menuname”, Find Menu exits 708. Otherwise, Find Menu compares 710 the itemname to the names of the items in the menu. If, 712, the located item number is greater than 0, then Find Menu queues 718 the menu id and item number for use by Menu command, and returns 720. Otherwise, if the item number is 0 then Find Menu simply sets 714 the internal Voice Control “mousedown” and “global” flags to true. This indicates to Voice Control that the mouse location should be globally referenced, and that the mouse button should be held down. Then Find Menu calls 716 the Post Mouse routine, which references these flags to manipulate the operating system's mouse state accordingly.

[0134] Referring to FIG. 21B, the Control command 722 performs a button push within a window, invoking actions such as the save command in the file menu of an application. To do this, the Control command gets the command parameters 724 from the control string, finds the front window 726, gets the window control list 728, and checks 730 if the control name exists in the control list. If the control name does exist in the control list then the control rectangle coordinates are calculated 732, the Post Mouse routine 734 clicks the mouse in the proper coordinates, and the Control command returns 736. If the control name is not found, the Control command returns directly.
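
The control lookup can be sketched as a walk of the front window's control list, comparing each control's title to ctlname and clicking the center of the matching rectangle. The Rect and Control types and the post_mouse_click helper below are hypothetical.

#include <stdio.h>
#include <string.h>

typedef struct { int top, left, bottom, right; } Rect;

typedef struct Control {
    const char      *title;   /* compared against ctlname              */
    Rect             bounds;  /* rectangle in which to click           */
    struct Control  *next;    /* next control in the window's list     */
} Control;

/* Hypothetical stand-in for the Post Mouse routine. */
static void post_mouse_click(int y, int x) { printf("click at (%d,%d)\n", y, x); }

/* Find the named control in the front window's control list and click it.
 * Returns 1 on success, 0 if the control name is not found. */
static int control_command(Control *control_list, const char *ctlname) {
    Control *c;
    for (c = control_list; c != NULL; c = c->next) {
        if (strcmp(c->title, ctlname) == 0) {
            int y = (c->bounds.top  + c->bounds.bottom) / 2;
            int x = (c->bounds.left + c->bounds.right)  / 2;
            post_mouse_click(y, x);
            return 1;
        }
    }
    return 0;   /* not found: return directly, as in FIG. 21B */
}

int main(void) {
    Control save = { "Save", { 100, 200, 120, 260 }, NULL };
    control_command(&save, "Save");
    return 0;
}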

[0135] The Keypad command 738 simulates numerical entries at the Macintosh keypad. Keypad finds the command parameters for the command string 740, gets the keycode value 742 for the desired key, posts a key down event 744 to the Macintosh event queue, and returns 746.

[0136] The Zoom command 748 zooms the front window. Zoom obtains the front window pointer 750 in order to reference the mouse to the front window, calculates the location of the zoom box 752, uses Post Mouse to click in the zoom box 754, and returns 756.

[0137] The Local Mouse command 758 clicks the mouse at a locally referenced location. Local Mouse obtains the command parameters for the desired mouse location 760, uses Post Mouse to click at the desired coordinate 762, and returns 764.

[0138] The Global Mouse command 766 clicks the mouse at a globally referenced location. Global Mouse obtains the command parameters for the desired mouse location 768, sets the global flag to true 770 (to signal to Post Mouse that the coordinates are global), uses Post Mouse to click at the desired coordinate 772, and returns 774.

[0139] The Double Click command double clicks the mouse at a locally referenced location. Double Click obtains the command parameters for the desired mouse location 778, calls Post Mouse twice 780, 782 (to click twice in the desired location), and returns 784.

[0140] The Mouse Down command 786 sets the mouse button down. Mouse Down sets the mousedown flag to true 788 (to signal to Post Mouse that the mouse button should be held down), uses Post Mouse to set the button down 790, and returns 792.

[0141] The Mouse Up command 794 sets the mouse button up. Mouse Up sets the mbState global (see Appendix B) to Mouse Button UP 796 (to signal to the operating system that the mouse button should be set up), posts a mouse up event to the Macintosh event queue 798 (to signal to applications that the mouse button has gone up), and returns 800.

[0142] Referring to FIG. 21D, the Screen Down command 802 scrolls the contents of the current window down. Screen Down first looks 804 for the vertical scroll bar in the front window. If, 806, the scroll bar is not found, Screen Down simply returns 814. If the scroll bar is found, Screen Down calculates the coordinates of the down arrow 808, sets the mousedown flag to true 810 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 812, and returns 814.

[0143] The Screen Up command 816 scrolls the contents of the current window up. Screen Up first looks 818 for the vertical scroll bar in the front window. If, 820, the scroll bar is not found, Screen Up simply returns 828. If the scroll bar is found, Screen Up calculates the coordinates of the up arrow 822, sets the mousedown flag to true 824 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 826, and returns 828.

[0144] The Screen Left command 830 scrolls the contents of the current window left. Screen Left first looks 832 for the horizontal scroll bar in the front window. If, 834, the scroll bar is not found, Screen Left simply returns 842. If the scroll bar is found, Screen Left calculates the coordinates of the left arrow 836, sets the mousedown flag to true 838 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 840, and returns 842.

[0145] The Screen Right command 844 scrolls the contents of the current window right. Screen Right first looks 846 for the horizontal scroll bar in the front window. If, 848, the scroll bar is not found, Screen Right simply returns 856. If the scroll bar is found, Screen Right calculates the coordinates of the right arrow 850, sets the mousedown flag to true 852 (indicating to Post Mouse that the mouse button should be held down), uses Post Mouse to set the mouse button down 854, and returns 856.
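
The four Screen commands share one pattern: locate the appropriate scroll bar in the front window, compute the coordinates of the relevant arrow, set the mousedown flag so that Post Mouse holds the button, and post the mouse there. A simplified sketch of that shared pattern follows; the scroll-bar lookup and the coordinates are placeholders.

#include <stdio.h>

typedef enum { SCROLL_DOWN, SCROLL_UP, SCROLL_LEFT, SCROLL_RIGHT } ScrollDir;
typedef struct { int y, x; int found; } ArrowLocation;

/* Placeholder: locate the scroll bar in the front window and return the
 * coordinates of the requested arrow.  Real code would walk the window's
 * control list and read the scroll bar's rectangle. */
static ArrowLocation find_scroll_arrow(ScrollDir dir) {
    ArrowLocation loc = { 0, 0, 1 };
    switch (dir) {
    case SCROLL_DOWN:  loc.y = 470; loc.x = 630; break;   /* bottom of vertical bar  */
    case SCROLL_UP:    loc.y =  30; loc.x = 630; break;   /* top of vertical bar     */
    case SCROLL_LEFT:  loc.y = 470; loc.x =  10; break;   /* left of horizontal bar  */
    case SCROLL_RIGHT: loc.y = 470; loc.x = 610; break;   /* right of horizontal bar */
    }
    return loc;
}

static int mousedown_flag = 0;   /* tells Post Mouse to hold the button down */

static void post_mouse(int y, int x) {
    printf("mouse %s at (%d,%d)\n", mousedown_flag ? "held down" : "clicked", y, x);
}

static void screen_scroll(ScrollDir dir) {
    ArrowLocation arrow = find_scroll_arrow(dir);
    if (!arrow.found) return;       /* no scroll bar: simply return            */
    mousedown_flag = 1;             /* hold the button so scrolling continues  */
    post_mouse(arrow.y, arrow.x);
}

int main(void) {
    screen_scroll(SCROLL_DOWN);
    return 0;
}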

[0146] Referring to FIG. 21E, the Page Down command 858 moves the contents of the current window down a page. Page Down first looks 860 for the vertical scroll bar in the front window. If, 862, the scroll bar is not found, Page Down simply returns 868. If the scroll bar is found, Page Down calculates the page down button coordinates 864, uses Post Mouse to click the mouse button down 866, and returns 868.

[0147] The Page Up command 870 moves the contents of the current window up a page. Page Up first looks 872 for the vertical scroll bar in the front window. If, 874, the scroll bar is not found, Page Up simply returns 880. If the scroll bar is found, Page Up calculates the page up button coordinates 876, uses Post Mouse to click the mouse button down 878, and returns 880.

[0148] The Page Left command 882 moves the contents of the current window left a page. Page Left first looks 884 for the horizontal scroll bar in the front window. If, 886, the scroll bar is not found, Page Left simply returns 892. If the scroll bar is found, Page Left calculates the page left button coordinates 888, uses Post Mouse to click the mouse button down 890, and returns 892.

[0149] The Page Right command 894 moves the contents of the current window right a page. Page Right first looks 896 for the horizontal scroll bar in the front window. If, 898, the scroll bar is not found, Page Right simply returns 904. If the scroll bar is found, Page Right calculates the page right button coordinates 900, uses Post Mouse to click the mouse button down 902, and returns 904.

[0150] Referring to FIG. 21F, the Move command 906 moves the mouse from its current location (y,x) to a new location (y+δy,x+δx). First, Move gets the command parameters 908, then Move sets the mouse speed to tablet 910 (this cancels the mouse acceleration, which otherwise would make mouse movements uncontrollable), adds the offset parameters to the current mouse location 912, forces a new cursor position and resets the mouse speed 914, and returns 916.
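
The Move command's arithmetic is simply the addition of the δy,δx offsets to the current mouse location, bracketed by cancelling and restoring the mouse acceleration. A small sketch, with hypothetical set_mouse_acceleration and set_cursor helpers:

#include <stdio.h>

typedef struct { int v, h; } Point;   /* vertical, horizontal */

static Point mouse_position = { 240, 320 };

/* Hypothetical helpers for the steps numbered 910-914 in FIG. 21F. */
static void set_mouse_acceleration(int on) { (void)on; }  /* stub: "tablet" speed = off */
static void set_cursor(Point p) { printf("cursor at (%d,%d)\n", p.v, p.h); }

/* @MOVE(dy,dx): move the mouse from (y,x) to (y+dy, x+dx). */
static void move_command(int dy, int dx) {
    set_mouse_acceleration(0);          /* cancel acceleration (tablet speed) */
    mouse_position.v += dy;             /* add the offsets to the current     */
    mouse_position.h += dx;             /* mouse location                     */
    set_cursor(mouse_position);         /* force the cursor to the new spot   */
    set_mouse_acceleration(1);          /* restore the previous mouse speed   */
}

int main(void) {
    move_command(-10, 25);
    return 0;
}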

[0151] The Move to Global Coordinate command 918 moves the cursor to the global coordinates given by the Voice Control command string. First, Move to Global gets the command parameters 920, then Move to Global checks 922 if there is a position parameter. If there is a position parameter, the screen position coordinates are fetched 924. In either case, the global coordinates are calculated 926, the mouse speed is set to tablet 928, the mouse position is set to the new coordinates 930, the cursor is forced to the new position 932, and Move to Global returns 934.

[0152] The Move to Local Coordinate command 936 moves the cursor to the local coordinates given by the Voice Control command string. First, Move to Local gets the command parameters 938, then Move to Local checks 940 if there is a position parameter. If there is a position parameter, the local position coordinates are fetched 942. In either case, the global coordinates are calculated 944, the mouse speed is set to tablet 946, the mouse position is set to the new coordinates 948, the cursor is forced to the new position 950, and Move to Local returns 952.

[0153] The Move Continuous command 954 moves the mouse continuously from its present location, moving δy,δx every refresh of the screen. This is accomplished by inserting 956 the VBL Move routine 960 in the Vertical Blanking Interrupt queue of the Macintosh and returning 958. Once in the queue, the VBL Move routine 960 will be executed every screen refresh. The VBL Move routine simply adds the δy and δx values to the current cursor position 962, resets the cursor 964, and returns 966.
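
The continuous movement relies on a routine that the system calls once per screen refresh. The sketch below models the vertical-blanking queue as a single registered function pointer invoked by a simulated refresh loop; vbl_install and run_refreshes are hypothetical.

#include <stdio.h>

typedef struct { int v, h; } Point;

static Point cursor = { 100, 100 };
static int   delta_v, delta_h;

/* The VBL Move routine: add the deltas and reset the cursor, once per refresh. */
static void vbl_move(void) {
    cursor.v += delta_v;
    cursor.h += delta_h;
    printf("refresh: cursor now (%d,%d)\n", cursor.v, cursor.h);
}

/* Hypothetical model of the vertical-blanking queue: one installed task. */
static void (*vbl_task)(void) = NULL;
static void vbl_install(void (*task)(void)) { vbl_task = task; }
static void run_refreshes(int n) { while (n-- > 0 && vbl_task) vbl_task(); }

/* @MOVI(dy,dx): install VBL Move and return; movement continues on its own. */
static void move_continuous(int dy, int dx) {
    delta_v = dy;
    delta_h = dx;
    vbl_install(vbl_move);
}

int main(void) {
    move_continuous(0, 5);    /* drift right 5 pixels per refresh */
    run_refreshes(3);
    return 0;
}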

[0154] Referring to FIG. 21G, the Option Key Down command 968 sets the option key down. This is done by setting the option key bit in the keyboard bit map to TRUE 970, and returning 972.

[0155] The Option Key Up command 974 sets the option key up. This is done by setting the option key bit in the keyboard bit map to FALSE 976, and returning 978.

[0156] The Shift Key Down command 980 sets the shift key down. This is done by setting the shift key bit in the keyboard bit map to TRUE 982, and returning 984.

[0157] The Shift Key Up command 986 sets the shift key up. This is done by setting the shift key bit in the keyboard bit map to FALSE 988, and returning 990.

[0158] The Command Key Down command 992 sets the command key down. This is done by setting the command key bit in the keyboard bit map to TRUE 994, and returning 996.

[0159] The Command Key Up command 998 sets the command key up. This is done by setting the command key bit in the keyboard bit map to FALSE 1000, and returning 1002.

[0160] The Control Key Down command 1004 sets the control key down. This is done by setting the control key bit in the keyboard bit map to TRUE 1006, and returning 1008.

[0161] The Control Key Up command 1010 sets the control key up. This is done by setting the control key bit in the keyboard bit map to FALSE 1012, and returning 1014.
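
Each of the eight key commands of FIG. 21G reduces to setting or clearing a single bit in the keyboard bit map. A sketch of that bit manipulation follows, using illustrative bit positions rather than the actual KeyMap layout of Appendix B.

#include <stdio.h>

/* Illustrative bit positions; the real KeyMap assigns different bits. */
enum { KEY_OPTION = 0, KEY_SHIFT = 1, KEY_COMMAND = 2, KEY_CONTROL = 3 };

static unsigned long keyboard_bitmap = 0;

/* @OPTD/@SHFD/@CMDD/@CTLD: set the key's bit to TRUE (key held down). */
static void meta_key_down(int key) { keyboard_bitmap |=  (1UL << key); }

/* @OPTU/@SHFU/@CMDU/@CTLU: set the key's bit to FALSE (key released). */
static void meta_key_up(int key)   { keyboard_bitmap &= ~(1UL << key); }

int main(void) {
    meta_key_down(KEY_COMMAND);               /* "command key down" */
    printf("bitmap = 0x%lX\n", keyboard_bitmap);
    meta_key_up(KEY_COMMAND);                 /* "command key up"   */
    printf("bitmap = 0x%lX\n", keyboard_bitmap);
    return 0;
}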

[0162] The Next Window command 1016 moves the front window to the back. This is done by getting the front window 1018 and sending it to the back 1020, and returning 1022.

[0163] The Erase command 1024 erases numchars characters from the screen. The number of characters typed by the most recent voice command is stored by Voice Control. Therefore, Erase will erase the characters from the most recent voice command. This is done by a loop which posts delete-key keydown events 1026 and checks 1028 if the number posted equals numchars. When numchars deletes have been posted, Erase returns 1030.

[0164] The Capitalize command 1032 capitalizes the next keystroke. This is done by setting the caps flag to TRUE 1034, and returning 1036.

[0165] The Launch command 1038 launches an application. The application must be on the boot drive no more than one level deep. This is done by getting the name of the application 1040 (“appl_name”), searching for appl_name on the boot volume 1042, and, if, 1044, the application is found, setting the volume to the application folder 1048, launching the application 1050 (no return is necessary because the new application will clear the Macintosh queue). If the application is not found, Launch simply returns 1046.

[0166] Referring to FIG. 22, the Post Mouse routine 1052 posts mouse down events to the Macintosh event queue and can set traps to monitor mouse activity and to keep the mouse down. The actions of Post Mouse are determined by the Voice Control flags global and mousedown, which are set by command handlers before calling Post Mouse. After a Post Mouse, when an application does a get_next_event it will see a mouse down event in the event queue, leading to events such as clicks, mouse downs or double clicks.

[0167] First, Post Mouse saves the current mouse location 1054 so that the mouse may be returned to its initial location after the mouse events are produced. Next the cursor is hidden 1056 to shield the user from seeing the mouse moving around the screen. Next the global flag is checked. If, 1058, the coordinates are local (i.e. global=FALSE) then they are converted 1060 to global coordinates. Next, the mouse speed is set to tablet 1062 (to avoid acceleration problems), and the mouse down is posted to the Macintosh event queue 1064. If, 1066, the mousedown flag is TRUE (i.e. if the mouse button should be held down) then the Set Mouse Down routine is called 1072 and Post Mouse returns 1070. Otherwise, if the mousedown flag is FALSE, then a click is created by posting a mouse up event to the Macintosh event queue 1068 and returning 1070.
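
Post Mouse's branching can be summarized in a few lines: convert local coordinates to global when the global flag is false, post the mouse down, and then either post a matching mouse up (an ordinary click) or install the button-holding trap when the mousedown flag is true. The helpers below (local_to_global, set_mouse_down_trap, and the event-posting stubs) are hypothetical.

#include <stdio.h>

typedef struct { int v, h; } Point;

static int global_flag, mousedown_flag;     /* set by command handlers */

/* Hypothetical helpers for the steps in FIG. 22. */
static Point local_to_global(Point p)  { p.v += 40; p.h += 4; return p; }  /* assumed window origin */
static void  post_mouse_down(Point p)  { printf("mouse down at (%d,%d)\n", p.v, p.h); }
static void  post_mouse_up(Point p)    { printf("mouse up at (%d,%d)\n",   p.v, p.h); }
static void  set_mouse_down_trap(void) { printf("button held: My Button trap installed\n"); }

static void post_mouse(Point p) {
    if (!global_flag)
        p = local_to_global(p);     /* local coordinates -> global        */
    post_mouse_down(p);             /* applications see a mouse down      */
    if (mousedown_flag)
        set_mouse_down_trap();      /* keep the button down (drags, etc.) */
    else
        post_mouse_up(p);           /* complete an ordinary click         */
}

int main(void) {
    Point p = { 120, 200 };
    global_flag = 0;                /* coordinates are window-relative    */
    mousedown_flag = 0;             /* an ordinary click                  */
    post_mouse(p);
    return 0;
}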

[0168] Referring to FIG. 23, the Set Mouse Down routine 1072 holds the mouse button down by replacing 1074 the Macintosh button trap with a Voice Control trap named My Button. The My Button trap then recognizes further voice commands and creates mouse drags or clicks as appropriate. After initializing My Button, Set Mouse Down checks 1076 if the Macintosh is a Macintosh Plus, in which case the Post Event trap must also be reset 1078 to the Voice Control My Post Event trap. (The Macintosh Plus will not simply check the mbState global flag to determine the mouse button state. Rather, the Post Event trap in a Macintosh Plus will poll the actual mouse button to determine its state, and will post mouse up events if the mouse button is up. Therefore, to force the Macintosh Plus to accept the mouse button state as dictated by Voice Control, during voice actions, the Post Event trap is replaced with a My Post Event trap, which will not poll the status of the mouse button.) Next, the mbState flag is set to MouseDown 1080 (indicating that the mouse button is down) and Set Mouse Down returns 1082.

[0169] The My Button trap 1084 replaces the Macintosh button trap, thereby seizing control of the button state from the operating system. Each time My Button is called, it checks 1086 the Macintosh mouse button state bit mbState. If mbState has been set to UP, My Button moves to the End Button routine 1106, which sets mbState to UP 1108, removes any VBL routine which has been installed 1110, resets the Button and Post Event traps to the original Macintosh traps 1112, resets the mouse speed and couples the cursor to the mouse 1114, shows the cursor 1102, and returns 1104.

[0170] However, if the mouse button is to remain down, My Button checks for the expiration of wait ticks (which allow the Macintosh time to draw menus on the screen) 1088, and calls the Recognize routine 1090 to recognize further speech commands. After further speech commands are recognized, My Button determines 1092 its next action based on the length of the command string. If the command string length is less than zero, then the next voice command was a Voice Control internal command, and the mouse button is released by calling End Button 1106. If the command string length is greater than zero, then a command was recognized, the command is queued onto the voice queue 1094, and the voice queue is checked for further commands 1096. If nothing was recognized (command string length of zero), then My Button skips directly to checking the voice queue 1096. If there is nothing in the voice queue, then My Button returns 1104. However, if there is a command in the voice queue, then My Button checks 1098 if the command is a mouse movement command (which would cause a mouse drag). If it is not a mouse movement, then the mouse button is released by calling End Button 1106. If the command is a mouse movement, then the command is executed 1100 (which drags the mouse), the cursor is displayed 1102, and My Button returns.
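
My Button's decision logic is driven by the length of the next recognized command string: a negative length releases the button, a positive length queues the command, and only mouse-movement commands keep the button down to produce a drag. The sketch below folds the voice-queue step into the length check for brevity; the recognize, is_mouse_movement, and execute_command helpers are hypothetical.

#include <stdio.h>
#include <string.h>

static int button_down = 1;            /* mbState modeled as a simple flag */

/* Hypothetical helpers standing in for the routines of FIGS. 16 and 23. */
static int  recognize(char *cmd, size_t n) { strncpy(cmd, "@MOVE(0,5)", n); return 10; }
static int  is_mouse_movement(const char *cmd) { return strncmp(cmd, "@MOV", 4) == 0; }
static void execute_command(const char *cmd)   { printf("drag via %s\n", cmd); }
static void end_button(void) { button_down = 0; printf("button released\n"); }

/* One pass of the My Button trap while the button is being held down. */
static void my_button(void) {
    char cmd[32] = "";
    int  len = recognize(cmd, sizeof cmd);

    if (len < 0) {                      /* internal Voice Control command     */
        end_button();
    } else if (len > 0) {               /* a command was recognized           */
        if (is_mouse_movement(cmd))
            execute_command(cmd);       /* drag: keep the button down         */
        else
            end_button();               /* anything else releases the button  */
    }
    /* len == 0: nothing recognized; leave the button state unchanged */
}

int main(void) {
    my_button();
    printf("button still down? %s\n", button_down ? "yes" : "no");
    return 0;
}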

Screen Displays

[0171] Referring to FIG. 24, a screen display of a record actions session is shown. The user is recording a local mouse click 1106, and the click is being acknowledged in the action list 1108 and in the action window 1110.

[0172] Referring to FIG. 25, a record actions session using dialog boxes is shown. The dialog boxes 1112 for recording a manual printer feed are displayed to the user, as well as the Voice Control Run Modal dialog box 1114 prompting the user to record the dialogs. The user is preparing to record a click on the Manual Feed button 1116.

[0173] Referring to FIG. 26, the Language Maker menu 1118 is shown.

[0174] Referring to FIG. 27, the user has requested the current language, which is displayed by Voice Control in a pop-up display 1120.

[0175] Referring to FIG. 28, the user has clicked on the utterance name “apple” 1122, requesting a retraining of the utterance for “apple”. Voice Control has responded with a dialog box 1124 asking the user to say “apple” twice into the microphone.

[0176] Referring to FIG. 29, the text format of a Write Production output file 1126 (to be compiled by VOCAL) and the corresponding Language Maker display for the file 1128 are shown. It is clear from FIG. 29 that the Language Maker display is far more intuitive.

[0177] Referring to FIG. 30, a listing of the Write Production output file as displayed in FIG. 29 is provided.

Other Embodiments

[0178] Other embodiments of the invention are within the scope of the claims which follow the appendices. For example, the graphic user interface controlled by a voice recognition system could be other than that of the Apple Macintosh computer. The recognizer could be other than that marketed by Dragon Systems.

[0179] Included in the Appendices are Appendix A, which sets forth the Voice Control command language syntax, Appendix B, which lists some of the Macintosh OS globals used by the Voice Navigator system, Appendix C, which is a fiche of the Voice Navigator executable code, Appendix D, which is the Developer's Reference Manual for the Voice Navigator system, and Appendix E, which is the Voice Navigator User's Manual, all incorporated by reference herein.

[0180] A portion of the disclosure of this patent document contains material which is subject to copyright protection (for example, the microfiche Appendix, the User's Manual, and the Reference Manual). The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

Appendix A: Voice Control Command Language Syntax

[0181] Menu Command—@MENU(menuname,itemnum).

[0182] Finds item named itemnum in the menu named menuname and selects it. If itemnum is 0, hold the menu down.

[0183] Control Command—@CTRL(ctlname)

[0184] Finds the control named ctlname and clicks in its rectangle.

[0185] Key Pad Command—@KYPD(n), where n=0-9, −, +, *, /, =, and c for clear

[0186] Posts a Keydown for keys on the numeric keypad.

[0187] Zoom Command—@ZOOM

[0188] Clicks in the zoom box of the front window.

[0189] Local Mouse Click Command—@LMSE(y,x)

[0190] Clicks at local coordinates (y,x) of the front window.

[0191] Global Mouse Click Command—@GMSE(y,x)

[0192] Clicks at the global coordinates (y,x) of the current screen.

[0193] Double Click Command—@DCLK(y,x)

[0194] Double clicks at the global coordinates (y,x) of the current screen. If y=x=0, double clicks at the current mouse location.

[0195] Mouse Down Command—@MSDN

[0196] Set the mouse button state to down and set up traps to keep it down.

[0197] Mouse Up Command—@MSUP

[0198] Set the mouse button state to up.

[0199] Scroll Down Command—@SCDN

[0200] Post a mouse down in the down arrow portion of the front window's scroll bar.

[0201] Scroll Up Command—@SCUP

[0202] Post a mouse down in the up arrow portion of the front window's scroll bar.

[0203] Scroll Left Command—@SCLF

[0204] Post a mouse down in the left arrow portion of the front window's scroll bar.

[0205] Scroll Right Command—@SCRT

[0206] Post a mouse down in the right arrow portion of the front window's scroll bar.

[0207] Page Down Command—@PGDN

[0208] Click in the page down portion of the front window's scroll bar.

[0209] Page Up Command—@PGUP

[0210] Click in the page up portion of the front window's scroll bar.

[0211] Page Left Command—@PGLF

[0212] Click in the page left portion of the front window's scroll bar.

[0213] Page Right Command—@PGRT

[0214] Click in the page right portion of the front window's scroll bar.

[0215] Move Command—@MOVE(δy,δx)

[0216] Move the mouse from its current location (y,x) to a new location (y+δy,x+δx), where δy and δx are pixels and can be either positive or negative values.

[0217] Move Continuous Command—@MOVI(δy,δx)

[0218] Move the mouse continuously from its present location, moving δy,δx every refresh of the screen.

[0219] Move to Local Coordinate Command—@MOVL(y,x<,windowname>) or

[0220] @MOVL(n<,y,x<,windowname>>), where n=N,S,E,W,NE,SE,SW,NW,C,G

[0221] Move the cursor to the local coordinates given by (y,x) or by (n.v+y,n.h+x). Use the grafPort of the window named “windowname”. If there is no “windowname” use the grafPort of the front window.

[0222] Move to Global Coordinate Command—@MOVG(n,<y,x>)

[0223] where n=N,S,E,W,NE,SE,SW,NW,C,G

[0224] Move the cursor to the global coordinates given by (y,x) or by (n.v+y,n.h+x). Use the grafPort of the screen.

[0225] Option Key Down Command—@OPTD

[0226] Press (and hold) the option key.

[0227] Option Key Up Command—@OPTU

[0228] Release the option key.

[0229] Shift Key Down Command—@SHFD

[0230] Press (and hold) the shift key.

[0231] Shift Key Up Command—@SHFU

[0232] Release the shift key.

[0233] Command Key Down Command—@CMDD

[0234] Press (and hold) the command key.

[0235] Command Key Up Command—@CMDU

[0236] Release the command key.

[0237] Control Key Down Command—@CTLD

[0238] Press (and hold) the control key.

[0239] Control Key Up Command—@CTLU

[0240] Release the control key.

[0241] Next Window Command—@NEXT

[0242] Sends the front window to the back.

[0243] Erase Command—@ERAS

[0244] Erase the last numChars typed.

[0245] Capitalize Command—@CAPS

[0246] Capitalize the next letter typed.

[0247] Launch Command—@LAUN(application_name)

[0248] Launch the application named application_name. The application must be on the boot drive no more than one level deep.

[0249] Wait Command—@WAIT(nnn)

[0250] Wait for nnn ticks to elapse before doing anything else in recognition.
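
By way of illustration only (this example does not appear in the appendices), an output string attached to an utterance may concatenate several of the above commands with ordinary text. For instance, a hypothetical utterance "save it" could carry the output string @CMDD s@CMDU, which presses the command key, types the letter s, and releases the command key, while the string @MENU(File,Save) would instead select the Save item from the File menu directly.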

Appendix B: Macintosh OS Globals

[0251] Interfacing to the Macintosh Operating System requires that certain low memory globals be managed by Voice Control. The following describes the most important globals. Further information is available in “Inside Macintosh”, Vols. I-V.

Mouse Globals

[0252] MickeyBytes EQU $D6A—a pointer to the cursor value; used to control the acceleration of the mouse. Set to point to tablet whenever the mouse is moved more than 10 pixels. [pointer]

[0253] MTemp EQU $828—a low-level interrupt mouse location; used to move the mouse during VBL handling while executing a @MOVI command. [long]

[0254] Mouse EQU $830—the processed mouse coordinate; used to move the mouse for all other @MOVX commands. [long]

[0255] MBState EQU $172—current mouse button state; used to set the MouseDown for @MSDN and for @MENU when itemnum is 0. [byte]
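
On the 68K Macintosh these globals reside at fixed low-memory addresses, and Voice Control reads and writes them directly. The sketch below models that access against a simulated low-memory array (on the real machine the addresses are absolute); the particular values written are illustrative only.

#include <stdio.h>

/* Simulated low-memory area standing in for the fixed 68K addresses above;
 * on the real Macintosh these are absolute addresses, not an array. */
static unsigned char low_memory[0x1000];

#define MBSTATE 0x172    /* current mouse button state  [byte] */
#define MOUSE   0x830    /* processed mouse coordinate  [long] */

static void set_global_byte(unsigned addr, unsigned char value) {
    low_memory[addr] = value;
}

static void set_global_long(unsigned addr, unsigned long value) {
    int i;
    for (i = 0; i < 4; i++)                       /* stored big-endian, as on the 68K */
        low_memory[addr + i] = (unsigned char)(value >> (8 * (3 - i)));
}

int main(void) {
    set_global_byte(MBSTATE, 0x00);                            /* illustrative "button down" value */
    set_global_long(MOUSE, ((unsigned long)240 << 16) | 320);  /* v=240, h=320 */
    printf("MBState = 0x%02X\n", low_memory[MBSTATE]);
    return 0;
}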

Keyboard Globals

[0256] KeyMap EQU $174—keyboard bit map, with one bit mapped to each key on the keyboard. Set the bit to TRUE to set the Meta keys (option, command, shift, control) down. [2 longs]

Filter Globals

[0257] JGNEFilter EQU $29A—Get Next Event filter proc; set to Voice Control's main loop to intercept calls to Get Next Event. [pointer]

Event Queue Globals

[0258] evtMax EQU $1E—maximum number of events in the event queue. When this number is reached, stop posting events.

[0259] EventQueue EQU $14A—event queue header, the location of the Macintosh event queue. [10 bytes]

Time Globals

[0260] Ticks EQU $16A—Tick count, time since boot. Used to measure elapsed time between Voice Control actions. [long]

Cursor Globals

[0261] CrsrCouple EQU $8CF—cursor coupled to mouse? Used to disconnect the cursor when doing remote clicks with @LMSE and @GMSE. [byte]

[0262] CrsrNew EQU $8CE—Cursor changed? Force a new cursor after moving the cursor. [byte]

Menu Globals

[0263] MenuList EQU $A1C—Current menuBar list structure. This handle can be de-referenced to find all the menus associated with an application. Use for @MENU commands. [handle]

Window Globals

[0264] WindowList EQU $9D6—Z-ordered linked list of windows. This pointer will lead to a chain of all existing windows for an application. Use to find a window queue for all local commands. [pointer]

Window Offsets

[0265] These values are offsets within the window records that describe characteristics of the window. Once a window is located, these offsets are used as follows:

[0266] thePort EQU 0—GrafPtr; local coordinates for @LMSE and @MOVL commands.

[0267] portRect EQU $10—port's rectangle [rect]; window-relative forms of the @MOVL command.

[0268] controlList EQU 140—used to find the controls associated with a window.

[0269] contrlTitle EQU 40—used to compare control titles for @CTRL commands. contrlRect EQU 8—used to calculate the click locations in a control.

[0270] nextwindow EQU 144—used to locate the next window for the @NEXT command.

1. A system for enabling voiced utterances to be substituted for manipulation of a pointing device, the pointing device being of the kind which is manipulated to control motion of a cursor on a computer display and to indicate desired actions associated with the position of the cursor on the display, the cursor being moved and the desired actions being aided by an operating system in the computer in response to control signals received from the pointing device, the computer also having an alphanumeric keyboard, the operating system being separately responsive to control signals received from the keyboard in accordance with a predetermined format specific to the keyboard, the system comprising a voice recognizer for recognizing a voiced utterance, and an interpreter for converting the voiced utterance into control signals which will directly create a desired action aided by the operating system in the computer without first being converted into control signals expressed in the predetermined format specific to the keyboard.

2. A method for converting voiced utterances to commands, expressed in a predefined command language, to be used by an operating system of a computer, comprising converting some voiced utterances into commands corresponding to actions to be taken by said operating system, and converting other voiced utterances into commands which carry associated text strings to be used as part of text being processed in an application program running under said operating system.
3. A method of generating a table for aiding the conversion of voiced utterances to commands for use in controlling an operating system of a computer to achieve desired actions in an application program running under the operating system, said application program including menus and control buttons, said method comprising parsing the instruction sequence of the application program to identify menu entries and control buttons, and including in said table an entry for each menu entry and control button found in said application program, each said entry containing a control command corresponding to said menu entry or control button.
4. A method of enabling a user to create an instance in a formal language of the kind which has a strictly defined syntax, comprising providing a graphically displayed list of entries which are expressed in a natural language and which do not comply with said syntax, permitting the user to point to an entry on said list, and automatically generating said instance corresponding to the identified entry in the list in response to said pointing.